Photo-Realistic Talking-Heads from Image Samples
Authors
Abstract
This paper describes a system for creating a photo-realistic model of the human head that can be animated and lip-synched from phonetic transcripts of text. Combined with a state-of-the-art text-to-speech synthesizer (TTS), it generates video animations of talking heads that closely resemble real people. To obtain a natural-looking head, we choose a “data-driven” approach. We record a talking person and apply image recognition to automatically extract bitmaps of facial parts. These bitmaps are normalized and parameterized before being entered into a database. For synthesis, the TTS provides the audio track as well as the phonetic transcript, from which trajectories in the space of parameterized bitmaps are computed for all facial parts. Sampling these trajectories and retrieving the corresponding bitmaps from the database produces animated facial parts. These facial parts are then projected and blended onto an image of the whole head using its pose information. This talking-head model can produce new, never-recorded speech of the person who was originally recorded. Talking-head animations of this type are useful as a front end for agents and avatars in multimedia applications such as virtual operators, virtual announcers, help desks, educational systems, and expert systems.
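The synthesis steps described in the abstract (compute a trajectory in parameter space from the phonetic transcript, sample it at the video frame rate, and retrieve the closest stored bitmap for each sample) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-dimensional parameter space (lip width, lip opening), the phoneme targets, the file names, linear interpolation between targets, and nearest-neighbour lookup. The final step of projecting and blending the parts onto the head image is omitted.

```python
from math import dist

# Hypothetical targets in a 2-D parameter space: (lip width, lip opening).
PHONEME_TARGETS = {
    "m": (0.5, 0.0),
    "a": (0.6, 1.0),
    "o": (0.2, 0.5),
}

# Hypothetical database of normalized mouth bitmaps, indexed by parameters.
DATABASE = {
    (0.5, 0.0): "mouth_closed.png",
    (0.55, 0.5): "mouth_half_open.png",
    (0.6, 1.0): "mouth_open_wide.png",
    (0.2, 0.5): "mouth_rounded.png",
}


def sample_trajectory(keyframes, fps):
    """Sample a piecewise-linear trajectory through the phoneme targets.

    keyframes: list of (time_in_seconds, phoneme), sorted by time, with at
    least two entries. Returns one parameter point per video frame.
    """
    start, end = keyframes[0][0], keyframes[-1][0]
    n_frames = int(round((end - start) * fps)) + 1
    points = []
    seg = 0
    for i in range(n_frames):
        t = start + i / fps
        # Advance to the segment that contains time t.
        while seg < len(keyframes) - 2 and t > keyframes[seg + 1][0]:
            seg += 1
        (t0, p0), (t1, p1) = keyframes[seg], keyframes[seg + 1]
        a, b = PHONEME_TARGETS[p0], PHONEME_TARGETS[p1]
        w = (t - t0) / (t1 - t0)
        points.append(tuple(x + w * (y - x) for x, y in zip(a, b)))
    return points


def retrieve_bitmap(point):
    """Return the stored bitmap whose parameters are closest to the point."""
    nearest = min(DATABASE, key=lambda params: dist(params, point))
    return DATABASE[nearest]


def synthesize(keyframes, fps=30):
    """Map a timed phonetic transcript to a sequence of mouth bitmaps."""
    return [retrieve_bitmap(p) for p in sample_trajectory(keyframes, fps)]


if __name__ == "__main__":
    transcript = [(0.0, "m"), (0.2, "a"), (0.4, "o")]
    for frame, bitmap in enumerate(synthesize(transcript, fps=10)):
        print(frame, bitmap)
```

In this sketch, in-between frames are served by whichever recorded sample lies nearest to the interpolated point, which is one simple way to read "sampling these trajectories and retrieving the corresponding bitmaps"; a real system would also need a separate trajectory per facial part.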
Similar resources
Audio-Visual Unit Selection for the Synthesis of Photo-Realistic Talking-Heads
This paper investigates audio-visual unit selection for the synthesis of photo-realistic, speech-synchronized talking-head animations. These animations are synthesized from recorded video samples of a subject speaking in front of a camera, resulting in a photo-realistic appearance. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mo...
Face Analysis for the Synthesis of Photo-Realistic Talking Heads
This paper describes techniques for extracting bitmaps of facial parts from videos of a talking person. The goal is to synthesize photo-realistic talking heads of high quality that show picture-perfect appearance and realistic head movements with good lip-sound synchronization. For the synthesis of a talking head, bitmaps of facial parts are combined to form whole heads and then sequences of su...
E-Partner: A Photo-Realistic Conversation Agent
An E-Partner is a photo-realistic conversation agent with a talking head that not only looks photo-realistic but can also hold a conversation with the user about a given topic. The conversation is multimedia-enriched in that the E-Partner presents relevant multimedia materials throughout the conversation. To address the challenges presented by the complex conversation domain and task, and ...
Sample-Based Synthesis of Photo-Realistic Talking Heads
This paper describes a system that generates photorealistic video animations of talking heads. First the system derives head models from existing video footage using image recognition techniques. It locates, extracts and labels facial parts such as mouth, eyes, and eyebrows into a compact library. Then, using these face models and a text-to-speech synthesizer, it synthesizes new video sequences...
Text Driven 3D Photo-Realistic Talking Head
We propose a new 3D photo-realistic talking head with a personalized, photo-realistic appearance. Different head motions and facial expressions can be freely controlled and rendered. It extends our prior high-quality 2D photo-realistic talking head to 3D. First, around 20 minutes of audio-visual 2D video are recorded of a speaker reading prompted sentences. We use a 2D-to-3D reconstru...
Journal:
- IEEE Trans. Multimedia
Volume 2
Publication date: 2000